    Case-based User Profiling in a Personal Travel Assistant

    Determining appropriate approaches for using data in feature selection

    Feature selection is increasingly important in data analysis and machine learning in the big data era. However, how to use the data in feature selection, i.e. using either ALL or PART of a dataset, has become a serious and tricky issue. Whilst the conventional practice of using all the data in feature selection may lead to selection bias, using part of the data may, on the other hand, lead to underestimating the relevant features under some conditions. This paper investigates these two strategies systematically in terms of reliability and effectiveness, and then determines their suitability for datasets with different characteristics. Reliability is measured by the Average Tanimoto Index and the Inter-method Average Tanimoto Index, and effectiveness is measured by the mean generalisation accuracy of classification. The computational experiments are carried out on ten real-world benchmark datasets and fourteen synthetic datasets. The synthetic datasets are generated with a pre-set number of relevant features, varied numbers of irrelevant features and instances, and different levels of added noise. The results indicate that the PART approach is more effective in reducing the bias when the dataset is small, but starts to lose its advantage as the dataset size increases.
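The abstract above measures reliability with the Average Tanimoto Index but does not define it. As a rough sketch only, the Tanimoto index of two selected feature subsets is commonly taken as the size of their intersection divided by the size of their union, and an average can be formed over all pairs of subsets (for example, one subset per resampling run). The function names and the pairwise averaging below are assumptions for illustration, not the paper's stated procedure.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two feature subsets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty selections are treated as identical (an assumption).
        return 1.0
    return len(a & b) / len(union)


def average_tanimoto(subsets):
    """Mean Tanimoto similarity over all unordered pairs of subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical feature indices selected in three resampling runs.
    runs = [{1, 2}, {1, 2}, {1, 3}]
    print(average_tanimoto(runs))
```

A value near 1 would mean the selection is stable across runs; a value near 0 would mean the selected subsets barely overlap.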

    SEIR immune strategy for instance weighted naive Bayes classification

    © Springer International Publishing Switzerland 2015. Naive Bayes (NB) has been widely applied in many classification tasks. However, in real-world applications, the pronounced advantage of NB is often challenged by insufficient training samples. Specifically, high variance may occur when the number of training samples is limited, and the estimated class distribution of an NB classifier is inaccurate if the number of training instances is small. To handle this issue, we propose a SEIR (Susceptible, Exposed, Infectious and Recovered) immune-strategy-based instance weighting algorithm for naive Bayes classification, namely SWNB. The immune instance weighting allows the SWNB algorithm to adjust itself to the data without explicit specification of functional or distributional forms of the underlying model. Experiments and comparisons on 20 benchmark datasets demonstrated that the proposed SWNB algorithm outperformed existing state-of-the-art instance weighted NB algorithms and other related computational intelligence methods.

    A Comparison of Machine Learning Methods for Cross-Domain Few-Shot Learning

    We present an empirical evaluation of machine learning algorithms in cross-domain few-shot learning based on a fixed pre-trained feature extractor. Experiments were performed in five target domains (CropDisease, EuroSAT, Food101, ISIC and ChestX) using two feature extractors: a ResNet10 model trained on a subset of ImageNet known as miniImageNet and a ResNet152 model trained on the ILSVRC 2012 subset of ImageNet. Commonly used machine learning algorithms including logistic regression, support vector machines, random forests, nearest neighbour classification, naïve Bayes, and linear and quadratic discriminant analysis were evaluated on the extracted feature vectors. We also evaluated classification accuracy when subjecting the feature vectors to normalisation using p-norms. Algorithms originally developed for the classification of gene expression data (the nearest shrunken centroid algorithm and LDA ensembles obtained with random projections) were also included in the experiments, in addition to a cosine similarity classifier that has recently proved popular in few-shot learning. The results enable us to identify algorithms, normalisation methods and pre-trained feature extractors that perform well in cross-domain few-shot learning. We show that the cosine similarity classifier and ℓ²-regularised 1-vs-rest logistic regression are generally the best-performing algorithms. We also show that algorithms such as LDA yield consistently higher accuracy when applied to ℓ²-normalised feature vectors. In addition, all classifiers generally perform better when extracting feature vectors using the ResNet152 model instead of the ResNet10 model.
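To make the ideas of ℓ²-normalised feature vectors and cosine similarity classification concrete, here is a minimal sketch. It assumes a nearest-prototype variant: each class is represented by the normalised mean of its normalised support vectors, and a query is assigned to the class with the highest cosine similarity. The abstract does not say how the paper's cosine similarity classifier is built or trained, so this variant, the function names, and the toy 2-D "features" are all illustrative assumptions.

```python
import math


def l2_normalise(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def class_prototypes(features, labels):
    """Average the normalised support vectors of each class, then renormalise."""
    sums, counts = {}, {}
    for v, y in zip(features, labels):
        u = l2_normalise(v)
        if y not in sums:
            sums[y] = [0.0] * len(u)
            counts[y] = 0
        sums[y] = [s + x for s, x in zip(sums[y], u)]
        counts[y] += 1
    return {y: l2_normalise([s / counts[y] for s in sums[y]]) for y in sums}


def predict(prototypes, v):
    """Return the class whose prototype has the highest cosine similarity to v."""
    u = l2_normalise(v)
    return max(
        prototypes,
        key=lambda y: sum(a * b for a, b in zip(prototypes[y], u)),
    )


if __name__ == "__main__":
    # Toy stand-ins for extracted feature vectors: two classes, two shots each.
    support = [[1.0, 0.0], [2.0, 0.2], [0.0, 1.0], [0.1, 3.0]]
    labels = ["a", "a", "b", "b"]
    protos = class_prototypes(support, labels)
    print(predict(protos, [5.0, 1.0]))
```

Because every vector is normalised first, only the direction of a feature vector matters, not its magnitude; the dot product of two unit vectors is exactly their cosine similarity.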

    Detecting Machine-obfuscated Plagiarism

    Research on academic integrity has identified online paraphrasing tools as a severe threat to the effectiveness of plagiarism detection systems. To enable the automated identification of machine-paraphrased text, we make three contributions. First, we evaluate the effectiveness of six prominent word embedding models in combination with five classifiers for distinguishing human-written from machine-paraphrased text. The best performing classification approach achieves an accuracy of 99.0% for documents and 83.4% for paragraphs. Second, we show that the best approach outperforms human experts and established plagiarism detection systems for these classification tasks. Third, we provide a Web application that uses the best performing classification approach to indicate whether a text underwent machine-paraphrasing. The data and code of our study are openly available.
    Related dataset: https://doi.org/10.7302/bewj-qx93 (also listed in the dc.relation field of the full item record).
    Peer reviewed. Full text: https://deepblue.lib.umich.edu/bitstream/2027.42/152346/1/Foltynek2020_Paraphrase_Detection.pdf

    Cross validation of bi-modal health-related stress assessment

    This study explores the feasibility of objective and ubiquitous stress assessment. 25 post-traumatic stress disorder patients participated in a controlled storytelling (ST) study and an ecologically valid reliving (RL) study. The two studies were meant to represent an early and a late therapy session, and each consisted of a "happy" and a "stress triggering" part. Two instruments were chosen to assess the stress level of the patients at various points in time during therapy: (i) speech, used as an objective and ubiquitous stress indicator and (ii) the subjective unit of distress (SUD), a clinically validated Likert scale. In total, 13 statistical parameters were derived from each of five speech features: amplitude, zero-crossings, power, high-frequency power, and pitch. To model the emotional state of the patients, 28 parameters were selected from this set by means of a linear regression model and, subsequently, compressed into 11 principal components. The SUD and speech model were cross-validated using 3 machine learning algorithms. Between 90% (2 SUD levels) and 39% (10 SUD levels) correct classification was achieved. The two sessions could be discriminated in 89% (for ST) and 77% (for RL) of the cases. This report fills a gap between laboratory and clinical studies, and its results emphasize the usefulness of Computer Aided Diagnostics (CAD) for mental health care.

    Fire detection from social media images by means of instance-based learning

    Social media can provide valuable information to support decision making in crisis management, such as in accidents, explosions, and fires. However, much of the data from social media are images, which are uploaded at a rate that makes it impossible for human beings to analyze them. To cope with that problem, we design and implement a database-driven architecture for fast and accurate fire detection named FFireDt. The design of FFireDt uses instance-based learning through indexed similarity queries expressed as an extension of the relational Structured Query Language. Our contributions are: (i) the design of the Fast-Fire Detection (FFireDt), which achieves efficiency and efficacy rates that rival those of state-of-the-art techniques; (ii) the sound evaluation of 36 image descriptors for the task of image classification in social media; (iii) the evaluation of content-based indexing with respect to the construction of instance-based classification systems; and (iv) the curation of a ground-truth annotated dataset of fire images from social media. Using real data from Flickr, the experiments showed that FFireDt was able to achieve a precision for fire detection comparable to that of human annotators. Our results are promising for the engineering of systems to monitor images uploaded to social media services.
    Funding: FAPESP, CNPq, CAPES, STIC-AmSud, and the RESCUER project, funded by the European Commission (Grant: 614154) and by the CNPq/MCTI (Grant: 490084/2013-3).
    Venue: International Conference on Enterprise Information Systems - ICEIS (17. 2015 Barcelona
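Instance-based learning, as mentioned in the abstract above, classifies a new item by looking up the most similar labelled items. The sketch below shows the general idea as a plain k-nearest-neighbour majority vote over descriptor vectors. It is not FFireDt: the abstract describes indexed similarity queries in an SQL extension, whereas this example uses an unindexed linear scan, Euclidean distance, and made-up 2-D descriptors and labels, all of which are assumptions for illustration.

```python
import math
from collections import Counter


def knn_classify(labelled, query, k=3):
    """Label a query vector by majority vote among its k nearest labelled vectors.

    `labelled` is a list of (descriptor, label) pairs; distance is Euclidean.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    nearest = sorted(labelled, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical image descriptors (e.g. two colour statistics) with labels.
    labelled = [
        ((0.9, 0.1), "fire"),
        ((0.8, 0.2), "fire"),
        ((0.85, 0.3), "fire"),
        ((0.1, 0.9), "not_fire"),
        ((0.2, 0.7), "not_fire"),
    ]
    print(knn_classify(labelled, (0.7, 0.2), k=3))
```

A linear scan like this costs one distance computation per stored instance for every query, which is the cost that an index over the descriptors is meant to reduce.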
